An investigation into feature construction to assist word sense disambiguation

Abstract

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.
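
The feature-construction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it assumes, for simplicity, that each ILP-constructed feature is a conjunction of ground facts (real ILP clauses contain variables), and every predicate name, fact, and sentence identifier below is hypothetical. Each example is turned into a binary vector with one entry per clause, which is the kind of input a support vector machine could then be trained on.

```python
from typing import Dict, FrozenSet, List, Tuple

# A ground fact about a sentence, e.g. ("object_class", "money").
Fact = Tuple[str, ...]
# A simplified "clause": a conjunction of facts that must all hold.
Clause = FrozenSet[Fact]


def to_feature_vector(example_facts: FrozenSet[Fact],
                      clauses: List[Clause]) -> List[int]:
    """Return one binary feature per clause: 1 if every fact in the
    clause is present in the example's facts, otherwise 0."""
    return [1 if clause <= example_facts else 0 for clause in clauses]


# Hypothetical clauses, standing in for features an ILP system might construct.
clauses: List[Clause] = [
    frozenset({("object_class", "money")}),
    frozenset({("subject_pos", "noun"), ("has_collocation", "account")}),
]

# Hypothetical background facts for two sentences containing the target word.
examples: Dict[str, FrozenSet[Fact]] = {
    "s1": frozenset({("object_class", "money"), ("subject_pos", "noun")}),
    "s2": frozenset({("subject_pos", "noun"), ("has_collocation", "account")}),
}

vectors = {name: to_feature_vector(facts, clauses)
           for name, facts in examples.items()}
```

The resulting vectors, paired with sense labels, could be passed to any standard classifier. The sketch omits the search over candidate clauses, which is the part an ILP system would actually perform.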
